The purpose of this lab is to continue learning a journalistic approach to data analysis in R.
We will continue to do things learned in previous labs:
Today, we’ll also learn:
This document is mostly set up for you to follow along and run code that I have written, and listen to me explain it.
At several points throughout this document, you will see the word Task.
That indicates I’m expecting you to modify the file I’ve given you, usually by creating a codeblock and writing some custom code.
When you are finished, you should save your R markdown file and Knit it as an HTML file.
You should upload it to GitHub, using GitHub desktop.
And the links to your project is what you’ll post on ELMS.
Need help? You are welcome to do the following things:
Take the following steps to set up your document:
We’re loading seven packages today. five of these we’ve loaded previously: the Tidyverse (for general data science goodness and visualizing charts and maps), janitor (for data cleaning), arcos (for loading WaPo opioid data) and tidycensus (for loading census data) and scales for cleaning up axis labels and legends.
We’re also going to load two new packages: mapview (for making interactive maps) and ggthemes (for doing cool styling stuff).
Task: In the code block below, load the packages we’ll need for today.
# Load Tidyverse, janitor and arcos, tidycensus, mapview, ggthemes, scales
#install.packages("tidycensus")
#install.packages("mapview")
#install.packages("ggthemes")
##install.packages("scales")
#install.packages("tidyverse")
library(janitor)
library(arcos)
library(tidyverse)
library(tidycensus)
library(mapview)
library(ggthemes)
library(scales)
For this exercise, we will be working with subsets of the DEA’s ARCOS database, which documented shipments of 76 billion opioid pills between 2006 and 2012, during the peak of the opioid epidemic.
The data was obtained after a lengthy legal battle by the Washington Post and the Charleston Gazette-Mail, and released by the Washington Post in raw and aggregated form. Washington Post “Digging into the DEA’s pain pill database” page.
A data dictionary is available here: ARCOS Registrant Handbook.
We’re going to load the data exclusively from the arcos R package API ARCOS API produced by the Washington Post, instead of uploading csvs and tsvs.
Remember, we need to store a password of sorts – called an API key – that will give us permission to access their data. Here’s a list of API keys that will work.
Let’s store the key first.
# store one of our API keys as an object called key
key <- "uO4EK6I"
#remove(county_population_per_year)
#remove(county_median_household_income)
#remove(county_pills_per_year)
#remove(acs_variables)
#remove(arcos_county_pills_per_year)
#remove(baltimore_city_pills_per_year)
#remove(logan_county_per_year)
#remove(m90)
#remove(md_county_median_household_income)
#remove(md_pills_by_county)
#remove(mingo)
#remove(pills_population)
#remove(pills_population_full)
#remove(pills_population_left)
#remove(pills_population_right)
#remove(wv_pills_per_year)
Our goal today will be to build several maps that show which counties and states received more pills than other parts of the country, adjusted for population. So, to do that, we’ll need to acquire both pill shipment data and population data, then calculate the pills per person statistic.
The ARCOS API has a table with total pills for each county for each year between 2006 and 2012. Let’s pull that down and clean and standardize the column headers with clean_names() . Remember to include the key.
arcos_county_pills_per_year <- summarized_county_annual(key = key) %>%
clean_names()
The ARCOS API also has a table with county population estimates for each year between 2006 and 2012. Let’s pull that down and clean and standardize the names. Remember to include the key.
arcos_county_population_per_year <- county_population(key = key) %>%
clean_names()
As we saw in previous labs, there’s an inconsistent number of records between the two tables. This is because there are some counties in the population table that aren’t in the pills table, and vice versa. We can see what these are using the anti_join() function from the Tidyverse. Note: we need to join by the countyfips code and year.
# 999 records in our population table without a countyfips+year match in our pills table
not_in_pills <- arcos_county_population_per_year %>%
anti_join(arcos_county_pills_per_year, by=c("countyfips","year"))
# 595 records in our pills table without a countyfips+year match in population table.
not_in_population <- arcos_county_pills_per_year %>%
anti_join(arcos_county_population_per_year, by=c("countyfips","year"))
Task: Examine the not_in_population table. Write a few sentences explaining patterns you see in which records show up in the pills table, but not in population table. How could this cause problems for you in the future?
# The most common reason that records show up in the pills table but not the population table is territories of the US that were counted in the ARCOS database but were not counted in the census. Puerto Rico, Guam, Virgin Islands. However, there are some shipments to the US where the county name was not logged, so it does not have a county fips code. This could cause issues since certain places recieved more pills than we will be counting because the name of the county was not recorded.
Okay, let’s put them together. Remember we need to join them on both countyfips and year. We’re also going to throw in buyer_county and buyer_state, which exist in both tables, so we don’t get the buyer_county.x and buyer_county.y weirdness.
We’re going to inner join here with population on the left side, because we want a consistent set of U.S. counties for when we do mapping later. And we’re going to make a judgment call that it’s okay to exclude pill shipments for which there were no assigned counties, which is what we’re doing when we do the left join.
pills_population <- arcos_county_population_per_year %>%
left_join(arcos_county_pills_per_year, by = c("countyfips", "year", "buyer_county","buyer_state"))
Now we’re going to clean up our data set a bit by first creating a new column with the state code, by taking the first two digits from our countyfips column, using str_sub() from the stringr package. We’ll need this to map states later on.
pills_population <- pills_population %>%
mutate(statefips = str_sub(countyfips, 1,2)) %>% #take the county fips column and take characters starting in position 1 and ending in position 2
select(statefips, countyfips, buyer_county, buyer_state,year, population,dosage_unit)
Good. Now we have a nice table we can use to build our maps.
Before we do that, we’ll need to get some shapefiles, or collections of coordinates to draw the map, be it states or counties.
There are lots of places to get shapefiles for all sorts of geographies. The U.S. Census is one of the best! They have a repository called TIGER where you can get a lot. Luckily for us, the tidycensus package allows us to pull in this data directly.
First, we need to store our API Key. You can use mine for now, but best to get your own.
# Define API Key
census_api_key("549950d36c22ff16455fe196bbbd01d63cfbe6cf")
# If you need to look up variables, this is how you do it
acs_variables <- load_variables(2017, "acs5" )
The tidycensus allows us to pull in information from the census by many different geographies – and pull in the “geometry” for the geographic units we define with it.
We’re going to pull in four sets of data here.
First, lets pull in county and state. The “variable” here is just a count of total population, but it doesn’t really matter. We just want the geometry, which we get by selecting geometry=TRUE.
#feed in variable from acs_variables
county_geodata <- get_acs(geography = "county",
variables = "B01001_001", geometry = TRUE)
##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 5%
|
|==== | 6%
|
|==== | 7%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|====== | 10%
|
|======= | 10%
|
|======= | 11%
|
|======== | 12%
|
|======== | 13%
|
|========= | 13%
|
|========= | 14%
|
|========== | 15%
|
|========== | 16%
|
|=========== | 16%
|
|=========== | 17%
|
|=========== | 18%
|
|============ | 18%
|
|============ | 19%
|
|============= | 19%
|
|============= | 20%
|
|============= | 21%
|
|============== | 21%
|
|============== | 22%
|
|=============== | 22%
|
|=============== | 23%
|
|=============== | 24%
|
|================ | 24%
|
|================ | 25%
|
|================= | 25%
|
|================= | 26%
|
|================= | 27%
|
|================== | 27%
|
|================== | 28%
|
|=================== | 29%
|
|=================== | 30%
|
|==================== | 30%
|
|==================== | 31%
|
|===================== | 32%
|
|===================== | 33%
|
|====================== | 33%
|
|====================== | 34%
|
|====================== | 35%
|
|======================= | 35%
|
|======================= | 36%
|
|======================== | 36%
|
|======================== | 37%
|
|======================== | 38%
|
|========================= | 38%
|
|========================= | 39%
|
|========================== | 39%
|
|========================== | 40%
|
|========================== | 41%
|
|=========================== | 41%
|
|=========================== | 42%
|
|============================ | 42%
|
|============================ | 43%
|
|============================ | 44%
|
|============================= | 44%
|
|============================= | 45%
|
|============================== | 45%
|
|============================== | 46%
|
|============================== | 47%
|
|=============================== | 47%
|
|=============================== | 48%
|
|================================ | 48%
|
|================================ | 49%
|
|================================ | 50%
|
|================================= | 50%
|
|================================= | 51%
|
|================================= | 52%
|
|================================== | 52%
|
|================================== | 53%
|
|=================================== | 53%
|
|=================================== | 54%
|
|==================================== | 55%
|
|==================================== | 56%
|
|===================================== | 56%
|
|===================================== | 57%
|
|===================================== | 58%
|
|====================================== | 58%
|
|====================================== | 59%
|
|======================================= | 59%
|
|======================================= | 60%
|
|======================================= | 61%
|
|======================================== | 61%
|
|======================================== | 62%
|
|========================================= | 62%
|
|========================================= | 63%
|
|========================================= | 64%
|
|========================================== | 64%
|
|========================================== | 65%
|
|=========================================== | 65%
|
|=========================================== | 66%
|
|=========================================== | 67%
|
|============================================ | 67%
|
|============================================ | 68%
|
|============================================= | 69%
|
|============================================= | 70%
|
|============================================== | 70%
|
|============================================== | 71%
|
|=============================================== | 72%
|
|=============================================== | 73%
|
|================================================ | 73%
|
|================================================ | 74%
|
|================================================ | 75%
|
|================================================= | 75%
|
|================================================= | 76%
|
|================================================== | 76%
|
|================================================== | 77%
|
|================================================== | 78%
|
|=================================================== | 78%
|
|=================================================== | 79%
|
|==================================================== | 79%
|
|==================================================== | 80%
|
|==================================================== | 81%
|
|===================================================== | 81%
|
|===================================================== | 82%
|
|====================================================== | 82%
|
|====================================================== | 83%
|
|====================================================== | 84%
|
|======================================================= | 84%
|
|======================================================= | 85%
|
|======================================================== | 85%
|
|======================================================== | 86%
|
|======================================================== | 87%
|
|========================================================= | 87%
|
|========================================================= | 88%
|
|========================================================== | 88%
|
|========================================================== | 89%
|
|========================================================== | 90%
|
|=========================================================== | 90%
|
|=========================================================== | 91%
|
|=========================================================== | 92%
|
|============================================================ | 92%
|
|============================================================ | 93%
|
|============================================================= | 93%
|
|============================================================= | 94%
|
|============================================================= | 95%
|
|============================================================== | 95%
|
|============================================================== | 96%
|
|=============================================================== | 96%
|
|=============================================================== | 97%
|
|=============================================================== | 98%
|
|================================================================ | 98%
|
|================================================================ | 99%
|
|=================================================================| 99%
|
|=================================================================| 100%
state_geodata <- get_acs(geography = "state",
variables = "B01001_001", geometry = TRUE)
##
|
| | 0%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 6%
|
|==== | 7%
|
|===== | 7%
|
|===== | 8%
|
|====== | 9%
|
|====== | 10%
|
|======= | 10%
|
|======= | 11%
|
|======== | 12%
|
|======== | 13%
|
|========= | 14%
|
|========== | 15%
|
|========== | 16%
|
|=========== | 16%
|
|=========== | 17%
|
|============ | 18%
|
|============ | 19%
|
|============= | 19%
|
|============= | 20%
|
|============= | 21%
|
|============== | 21%
|
|============== | 22%
|
|=============== | 23%
|
|=============== | 24%
|
|================ | 24%
|
|================ | 25%
|
|================= | 26%
|
|================= | 27%
|
|================== | 27%
|
|================== | 28%
|
|=================== | 29%
|
|=================== | 30%
|
|==================== | 30%
|
|==================== | 31%
|
|===================== | 32%
|
|===================== | 33%
|
|====================== | 33%
|
|====================== | 34%
|
|====================== | 35%
|
|======================= | 35%
|
|======================= | 36%
|
|======================== | 36%
|
|======================== | 37%
|
|========================= | 38%
|
|========================= | 39%
|
|========================== | 39%
|
|========================== | 40%
|
|========================== | 41%
|
|=========================== | 41%
|
|=========================== | 42%
|
|============================ | 42%
|
|============================ | 43%
|
|============================= | 44%
|
|============================= | 45%
|
|============================== | 46%
|
|============================== | 47%
|
|=============================== | 47%
|
|=============================== | 48%
|
|================================ | 49%
|
|================================ | 50%
|
|================================= | 50%
|
|================================= | 51%
|
|================================== | 52%
|
|================================== | 53%
|
|=================================== | 53%
|
|=================================== | 54%
|
|==================================== | 55%
|
|==================================== | 56%
|
|===================================== | 56%
|
|===================================== | 57%
|
|====================================== | 58%
|
|====================================== | 59%
|
|======================================= | 60%
|
|======================================== | 61%
|
|======================================== | 62%
|
|========================================= | 62%
|
|========================================= | 63%
|
|========================================= | 64%
|
|========================================== | 64%
|
|========================================== | 65%
|
|=========================================== | 66%
|
|=========================================== | 67%
|
|============================================ | 67%
|
|============================================ | 68%
|
|============================================= | 69%
|
|============================================= | 70%
|
|============================================== | 70%
|
|============================================== | 71%
|
|============================================== | 72%
|
|=============================================== | 72%
|
|=============================================== | 73%
|
|================================================ | 73%
|
|================================================ | 74%
|
|================================================= | 75%
|
|================================================= | 76%
|
|================================================== | 76%
|
|================================================== | 77%
|
|=================================================== | 78%
|
|=================================================== | 79%
|
|==================================================== | 79%
|
|==================================================== | 80%
|
|==================================================== | 81%
|
|===================================================== | 81%
|
|===================================================== | 82%
|
|====================================================== | 83%
|
|======================================================= | 84%
|
|======================================================= | 85%
|
|======================================================== | 86%
|
|======================================================== | 87%
|
|========================================================= | 87%
|
|========================================================= | 88%
|
|========================================================== | 89%
|
|========================================================== | 90%
|
|=========================================================== | 90%
|
|=========================================================== | 91%
|
|============================================================ | 92%
|
|============================================================ | 93%
|
|============================================================= | 93%
|
|============================================================= | 94%
|
|============================================================== | 95%
|
|============================================================== | 96%
|
|=============================================================== | 96%
|
|=============================================================== | 97%
|
|=============================================================== | 98%
|
|================================================================ | 98%
|
|================================================================ | 99%
|
|=================================================================| 100%
Let’s take a look at these files. Warning, if you’re working on a slow machine, don’t view it, as it may crash your R instance. Just watch the video. The last column has geometry information, a collection of lat and long points that form a polygon shape.
We’re also going to pull down “shifted” geometry, because it will help us make better maps. Instead of putting Hawaii and Alaska in their usual spots on the globe – which makes the map unncessarily large – it moves them into a spot right near the continental U.S. You’ve seen maps like this. I’ll show you an example in a second.
#the shift_geo will take AL and HI and position them in a place that looks good and a size that looks good
county_geodata_shifted <- get_acs(geography = "county",
variables = "B01001_001", geometry = TRUE, shift_geo = TRUE)
state_geodata_shifted <- get_acs(geography = "state",
variables = "B01001_001", geometry = TRUE, shift_geo = TRUE)
Let’s take a look at the columns in our shapefiles. We can View, but if you’re computer is too slow to View, run the code below. Unfortunately, we can’t use glimpse() or summary() on the file now that it contains geo information, for some reason.
print(head(state_geodata_shifted))
## Simple feature collection with 6 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -2031905 ymin: -1470717 xmax: 2295505 ymax: 67481.2
## epsg (SRID): NA
## proj4string: +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
## GEOID NAME variable estimate moe
## 1 04 Arizona B01001_001 6809946 NA
## 2 05 Arkansas B01001_001 2977944 NA
## 3 06 California B01001_001 38982847 NA
## 4 08 Colorado B01001_001 5436519 NA
## 5 09 Connecticut B01001_001 3594478 NA
## 6 11 District of Columbia B01001_001 672391 NA
## geometry
## 1 MULTIPOLYGON (((-1111066 -8...
## 2 MULTIPOLYGON (((557903.1 -1...
## 3 MULTIPOLYGON (((-1853480 -9...
## 4 MULTIPOLYGON (((-613452.9 -...
## 5 MULTIPOLYGON (((2226838 519...
## 6 MULTIPOLYGON (((1960720 -41...
Let’s make a simple map just from our population shapefiles to see how ggplot2 treats geospatial data.
Three lines of code is all we need. First, the dataframe we’re doing this to, which contains our spatial information. Then we tell it to make a ggplot graphic, using our population information (the variable is called “estimate” in our table) to fill in the shapes with a color gradient. Then geom_sf() says “take the information in the geometry column” and draw a map!
options(scipen=999)
state_geodata_shifted %>%
ggplot(aes(fill = estimate)) +
geom_sf()
#fill estimate = data point from state geodata
#geom_sf = look for the geometry and fill in the shape file
# Turn off scientific notation if your legend looks weird
Nice! Okay, just like with ggplot2 in the last lab, we can keep adding stuff to our function to fix the styling. I don’t know about you, but those geographic coordinates are useless information, so we can get rid of those. Fortunately, the theme_map() function that’s part of ggthemes package will take care of it.
state_geodata_shifted %>%
ggplot(aes(fill = estimate)) +
geom_sf() +
theme_map()
I don’t love those state line borders. Let’s set the width (lwd) to 0.
state_geodata_shifted %>%
ggplot(aes(fill = estimate)) +
geom_sf(lwd = 0) +
theme_map()
Now, let’s clean up the legend title, put commas in the legend labels, and move the legend to the right.
state_geodata_shifted %>%
ggplot(aes(fill = estimate)) +
geom_sf(lwd = 0) +
theme_map() +
labs(fill='Population') +
scale_fill_continuous(labels = comma) +
theme(legend.position="right")
Let’s give it a title, subtitle and source.
state_geodata_shifted %>%
ggplot(aes(fill = estimate)) +
geom_sf(lwd = 0) +
theme_map() +
labs(fill='Population',title="California, Texas have largest populations", subtitle = "2017 population, U.S. Census", caption = "Source: U.S. Census ACS") +
scale_fill_continuous(labels = comma) +
theme(legend.position="right")
And lastly, let’s give it a cooler color scheme, viridis.
#final graphic
state_geodata_shifted %>%
ggplot(aes(fill = estimate)) +
geom_sf(lwd = 0) +
theme_map() +
labs(fill='Population',title="California, Texas have largest populations", subtitle = "2017 population, U.S. Census", caption = "Source: U.S. Census ACS") +
theme(legend.position="right") +
scale_fill_viridis_c(option = "magma",labels = comma)
# Also try "plasma", "viridis", "cividis"
Note: we’re just scratching the surface here of all the options for mapping in ggplot2. Link. And there are other mapping libraries we can use.
Before we move on, I want to show you how we can create a simple interactive map we can use to further explore the data with the mapview library.
After we’ve loaded the mapview package, the function we use is called “mapview”. We feed it the name of our data set, the name of the variable we want to visualize and indicate if we want a legend. That’s pretty much it.
mapview(state_geodata_shifted, zcol = "estimate", legend = TRUE)
Okay, so now our goal is to look for regional trends in opioid shipment rates by building a state map that shows pills per person shipped per year.
Remember our pills_population table we made above? Let’s make a state table looking at state totals in 2009.
First, we filter just for 2009. Then we group by the state fips code and buyer state. Then we add up all of the pills shipped to that state in 2009 and the population for that state in 2009, creating two new columns.
pills_population_state_2009 <- pills_population %>%
filter(year==2009) %>%
group_by(statefips, buyer_state) %>%
summarise(total_pills=sum(dosage_unit, na.rm = TRUE),
total_population=sum(population, na.rm=TRUE)) %>%
mutate(pills_per_person=total_pills/total_population)
Now, let’s join it to our shapefile data for our states, using the GEOID from our shapefile and statefips, the column we created with the state code.
geo_pills_population_state_2009 <- state_geodata_shifted %>%
inner_join(pills_population_state_2009, by=c("GEOID" = "statefips"))
Okay, first, let’s build an interactive version of the map.
Task: Using the mapview() function we used above, show an interactive map that shades each state according to the pills per person rate in 2009.
# interactive map code here
mapview(geo_pills_population_state_2009, zcol = "pills_per_person" , legend = TRUE)
Now let’s plot a basic version.
geo_pills_population_state_2009 %>%
ggplot(aes(fill = pills_per_person)) +
geom_sf()
Task: stylize the map, making it look like the final population map we created in the first part of the exercise with two changes: center the legend on the bottom of the graphic, instead of at right. You might try searching for legend.position on Google. And write up a headline, subtitle and source appropriate for the chart. ’
state_geodata_shifted %>% ggplot(aes(fill = estimate)) + geom_sf(lwd = 0) + theme_map() + labs(fill=‘Population’,title=“California, Texas have largest populations”, subtitle = “2017 population, U.S. Census”, caption = “Source: U.S. Census ACS”) + theme(legend.position=“right”) + scale_fill_viridis_c(option = “magma”,labels = comma)
geo_pills_population_state_2009 %>%
ggplot(aes(fill = pills_per_person)) +
geom_sf(lwd = 0) +
theme_map() +
labs(fill='Pills Per Person', title="West Virginia, Kentucky, Tennessee, South Carolina and Nevada recieved the highest number of pills per person", subtitle="2009 population and pills shipped, ARCOS and census data", caption="Source: U.S. Census ACS and ARCOS database via Washington Post") +
theme(legend.position = "bottom") +
scale_fill_viridis_c(option = "magma",labels = comma)
Let’s say we want to create a single map for each year between 2006 and 2012, to see how the rate of pills per person changed over time in each state.
First, we’ll group by statefips, buyer state and year. Then, we’ll calculate total pills and total population for each state for each year, before piping those columns into our pills per person calculation.
That will give us one pills_per_person rate per state per year.
pills_population_state <- pills_population %>%
group_by(statefips, buyer_state, year) %>%
summarise(total_pills=sum(dosage_unit, na.rm = TRUE),
total_population=sum(population, na.rm=TRUE)) %>%
mutate(pills_per_person=total_pills/total_population)
Now, let’s join it to our shapefile data for our states, using the GEOID from our shapefile and statefips, the column we created with the state code.
geo_pills_population_state <- state_geodata_shifted %>%
inner_join(pills_population_state, by=c("GEOID" = "statefips"))
And now, let’s plot it. This is the same as above, but I’ve added one line: facet_wrap(~year, nrow=2).
This says: make one chart for each year, using the year column and organize the grid into two rows.
geo_pills_population_state %>%
ggplot(aes(fill = pills_per_person)) +
facet_wrap(~year, nrow=2) +
geom_sf(lwd = 0) +
theme_map() +
labs(fill='Pills per Person',title="", subtitle = "", caption = "Source: ARCOS database") +
theme(legend.position="bottom") +
scale_fill_viridis_c(option = "magma",labels = comma)
Task: copy the code above and paste it below. Modify the title and subtitle to describe the trend you identify with the map. What happens between 2006 and 2012?
# Map code here
geo_pills_population_state %>%
ggplot(aes(fill = pills_per_person)) +
facet_wrap(~year, nrow=2) +
geom_sf(lwd = 0) +
theme_map() +
labs(fill='Pills per Person',title="Pills per person in the United States increased from 2006-2012", subtitle = "South Carolina saw the biggest increase in pills per person over the 6 year period", caption = "Source: ARCOS database") +
theme(legend.position="bottom") +
scale_fill_viridis_c(option = "magma",labels = comma)
Task: Make a single INTERACTIVE map that is colored based on the number of pills per person in each county in 2009. Provide a short writeup on what you find.What trends do you see, beyond Appalachia? Note: the map will probably open in a web browser.
mapview(geo_pills_population_state_2009, zcol = “pills_per_person” , legend = TRUE)
pills_population_county_2009 <- pills_population %>%
filter(year == 2009) %>%
mutate(pills_per_person=dosage_unit/population)
geo_pill_population_county_2009 <- county_geodata_shifted %>%
inner_join(pills_population_county_2009, by=c('GEOID' = 'countyfips')) %>%
select(statefips, GEOID, buyer_county, buyer_state, year, population, dosage_unit, pills_per_person, geometry)
mapview(geo_pill_population_county_2009, zcol = "pills_per_person", legend = TRUE)
#TRENDS: counties on the west coast of the US have a higher pills per person than the rest of the west coast, specifically northern cali to washington, two counties in SC (Charleston and Berkley) have significantly higher pills per person rate than anywhere else
Task: Make a static facet map, with one map per year between 2006 and 2012, showing only Maryland counties, colored based on the number of pills per person in each county in each year. Provide a short writeup on what you find. What trends do you see?
geo_pills_population_state %>% ggplot(aes(fill = pills_per_person)) + facet_wrap(~year, nrow=2) + geom_sf(lwd = 0) + theme_map() + labs(fill=‘Pills per Person’,title=“Pills per person in the United States increased from 2006-2012”, subtitle = “South Carolina saw the biggest increase in pills per person over the 6 year period”, caption = “Source: ARCOS database”) + theme(legend.position=“bottom”) + scale_fill_viridis_c(option = “magma”,labels = comma)
pills_population_MD <- pills_population %>%
filter(buyer_state == 'MD') %>%
mutate(pills_per_person=dosage_unit/population)
geo_pills_population_MD <- county_geodata_shifted %>%
inner_join(pills_population_MD, by=c('GEOID' = 'countyfips')) %>%
select(statefips, GEOID, buyer_county, buyer_state, year, population, dosage_unit, pills_per_person, geometry)
glimpse(geo_pills_population_MD)
## Observations: 168
## Variables: 9
## $ statefips <chr> "24", "24", "24", "24", "24", "24", "24", "24",…
## $ GEOID <chr> "24001", "24001", "24001", "24001", "24001", "2…
## $ buyer_county <chr> "ALLEGANY", "ALLEGANY", "ALLEGANY", "ALLEGANY",…
## $ buyer_state <chr> "MD", "MD", "MD", "MD", "MD", "MD", "MD", "MD",…
## $ year <int> 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2006,…
## $ population <int> 73980, 74449, 74638, 72598, 74638, 74788, 74645…
## $ dosage_unit <dbl> 3371670, 3634570, 3954160, 4409110, 4592010, 49…
## $ pills_per_person <dbl> 45.57543, 48.81959, 52.97784, 60.73322, 61.5237…
## $ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((1823336 -37..., MU…
geo_pills_population_MD %>%
ggplot(aes(fill = pills_per_person)) +
facet_wrap(~year, nrow=2) +
geom_sf(lwd=0) +
theme_map() +
labs(fill="Pills Per Person", title="Pills per person in the state of Maryland increased from 2006 to 2012", subtitle="There was not much change in Prince George's and Montgomery counites", caption = "Source: US Census and ARCOS database via Washington Post") +
theme(legend.position = "bottom") +
scale_fill_viridis_c(option="magma", labels = comma)
Save the R Markdown file. Knit it to HTML and make sure it compiles correctly. Upload to GitHub, as instructed. Provide links to GitHub in ELMS.